adding positional encoder changes and tests #32600

manuelsh · 2024-08-11T17:25:28Z

@amyeroberts as there were some conflicts with merging with main on #31900 (possibly due to the make scripts), I have reimplemented all the changes of #30783 in a new branch, which is rebased to main.

manuelsh · 2024-08-12T16:56:18Z

@amyeroberts I have included the interpolation of positional embeddings in all the following models, and their respective tests:

altclip
bridgetower
chineseclip
clip
clipseg
kosmos_2
x_clip
git

Thanks!

amyeroberts

Thanks for adding - looks great!

Just a handful of small nits. Before merge, we'll need to run the slow tests for the models affected. Could you trigger this by running git commit --allow-empty -m "[run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip"

amyeroberts · 2024-08-14T11:36:05Z

tests/models/x_clip/test_modeling_x_clip.py

+                model(**inputs, interpolate_pos_encoding=False)
+        # forward pass
+        with torch.no_grad():
+            outputs = model(**inputs, interpolate_pos_encoding=True)


amyeroberts · 2024-08-14T11:36:50Z

tests/models/git/test_modeling_git.py

+    @unittest.skip(reason="GitForCausalLM does not support inputs_embeds in generate method")
+    def test_inputs_embeds_matches_input_ids_with_generate(self):
+        pass
+


Let's remove this as this logic is independent of this PR

Suggested change

@unittest.skip(reason="GitForCausalLM does not support inputs_embeds in generate method")

def test_inputs_embeds_matches_input_ids_with_generate(self):

pass

If I remove this, I will get the following error from the CI pipeline:

FAILED tests/models/git/test_modeling_git.py::GitModelTest::test_inputs_embeds_matches_input_ids_with_generate - ValueError: You passed `inputs_embeds` to `.generate()`, but the model class GitForCausalLM doesn't have its forwarding implemented. See the GPT2 implementation for an example (https://github.com/huggingface/transformers/pull/21405), and feel free to open a PR with it!

as shown here

Could you rebase on main? I believe this has been resolved upstream

This should still be removed as this tests is unrelated to this PR

amyeroberts · 2024-08-14T11:39:33Z

src/transformers/models/clip/modeling_clip.py

        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
+


Suggested change

amyeroberts · 2024-08-14T11:39:59Z

src/transformers/models/clipseg/modeling_clipseg.py

@@ -512,6 +526,8 @@ def _init_weights(self, module):
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
            more detail.
+        interpolate_pos_encoding (`bool`, *optional*):


Suggested change

interpolate_pos_encoding (`bool`, *optional*):

interpolate_pos_encoding (`bool`, *optional*, defaults to `False`):

amyeroberts · 2024-08-14T11:40:07Z

src/transformers/models/clipseg/modeling_clipseg.py

@@ -549,6 +565,8 @@ def _init_weights(self, module):
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
            more detail.
+        interpolate_pos_encoding (`bool`, *optional*):


Suggested change

interpolate_pos_encoding (`bool`, *optional*):

interpolate_pos_encoding (`bool`, *optional*, defaults to `False`):

…smos2, x_clip

manuelsh · 2024-08-15T22:47:06Z

OK, I've:

added your three suggestions (thanks!)
I haven't removed the @unittest.skip(reason="GitForCausalLM does not support inputs_embeds in generate method")... lines, as per my comment, please let me know
I have run the slow tests with the git command sent. However, I don't think it is running the right slow tests as I just detected some errors on them which I am fixing now (for example in the "bridgetower" one)

Please don't merge yet, just need some time to check and potentially fix the tests.

manuelsh · 2024-08-16T20:37:53Z

GIT model test still requires to be fixed. Getting this error:

tests.models.git.test_modeling_git.GitModelIntegrationTest.test_inference_interpolate_pos_encoding failed with error: <class 'IndexError'> index out of range in self
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/local/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/usr/src/app/transformers/tests/models/git/test_modeling_git.py", line 588, in test_inference_interpolate_pos_encoding
    outputs = model(**inputs, interpolate_pos_encoding=True)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/src/app/transformers/src/transformers/models/git/modeling_git.py", line 1302, in forward
    embedding_output = self.embeddings(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/src/app/transformers/src/transformers/models/git/modeling_git.py", line 115, in forward
    embeddings = self.word_embeddings(input_ids)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 164, in forward
    return F.embedding(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

due to input_ids having values out of range (tensor([[49406, 768, 568, 530, 518, 2867, 49407]], dtype=torch.int32)). In concrete 49406 and 49407 are not accepted. Not sure why the processor is adding them.

Still on it.

amyeroberts · 2024-08-19T10:49:53Z

@manuelsh Have you included the most recent updates from main?

…smos2, x_clip

manuelsh · 2024-09-15T14:13:33Z

@amyeroberts done, all interpolate_pos_encoding functions updated.

amyeroberts

Beautiful - thanks for adding this capability to our models and for iterating on a solution!

amyeroberts · 2024-09-16T18:58:44Z

@manuelsh Just the failing slow tests to address!

manuelsh · 2024-09-16T22:30:48Z

@amyeroberts , I think it is not just substituting interpolate_pos_encoding function, but one needs to adapt it, as the position_embeddings tensor from #33226 is different from the position_embedding object in the code of this PR (note the s).

I believe I can make them work with

self.position_embeddings = self.position_embedding.weight.unsqueeze(0)

but now all my tests are crashing for different reasons (different tensors outputs for example) and this will take longer.

Why not getting back to the previous working commit (d44e070), merge it, and then open another PR like #33226 but for the clip family models?

I would be happy to contribute to it.

manuelsh · 2024-09-18T19:47:24Z

@amyeroberts I was able to fix all tests with the new function, so no need to do an additional PR. Please review.

amyeroberts · 2024-09-23T11:14:44Z

@manuelsh OK, great. Just the merge conflict to resolve an a final slow run to confirm everything passes now.

…smos2, x_clip

manuelsh · 2024-09-24T15:13:44Z

@amyeroberts I have resolved the conflicts, run the tests in slow, corrected one test in clipseg model not related to this PR that was not working test_inference_image_segmentation, and requested another slow run. Hopefully that's it!

manuelsh · 2024-09-25T09:28:23Z

@amyeroberts there were two tensors to correct in clipseg. Done now. If you can kindly approve the run for clipseg slow tests.

amyeroberts · 2024-09-25T18:04:57Z

@manuelsh Done and all passing - let's merge! Thanks for this addition and patience on iterating on this :)

manuelsh · 2024-09-25T18:08:13Z

Fantastic! Glad to see it. Thank you!

* adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding right pretrained model * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_image_segmentation * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding ruff formatting * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding new interpolate_pos_encoding function * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing interpolate_POS funciton * adapting output tensor in teests * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * modifying output tensor * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding the correct tensor * [run_slow] clipseg * fixing spaces * [run_slow] clipseg * [run_slow] clipseg --------- Co-authored-by: Manuel Sanchez Hernandez <[email protected]>

mcmonkey4eva · 2024-10-25T13:54:27Z

src/transformers/models/clipseg/modeling_clipseg.py

+        batch_size, _, height, width = pixel_values.shape
+        if not interpolate_pos_encoding and (height != self.image_size or width != self.image_size):
+            raise ValueError(
+                f"Input image size ({height}*{width}) doesn't match model" f" ({self.image_size}*{self.image_size})."


Was this tested? I'm getting user reports of error Input image size (352*352) doesn't match model (224*224). regardless of what size of image is put in - I'm assuming somewhere in ClipSeg code it resizes to 352 - which makes sense considering docs seem to indicate that 352 is an expected value https://huggingface.co/docs/transformers/model_doc/clipseg#transformers.CLIPSegForImageSegmentation.forward.example while indeed also showing that image_size is 224 https://huggingface.co/docs/transformers/model_doc/clipseg#transformers.CLIPSegVisionConfig.image_size

ie: using ClipSeg exactly as documented will necessarily throw this error.

The original clipseg model built by huggingface staff defines image_size 224 in config https://huggingface.co/CIDAS/clipseg-rd64-refined/blob/main/config.json#L127
and the preprocessor size as 352 https://huggingface.co/CIDAS/clipseg-rd64-refined/blob/main/preprocessor_config.json#L20

So either this check is wrong, or official configs have been wrong for years, or there's meant to be some handling of the sizes between the two that's gone missing

EDIT: Posted issue #34415

You need to add interpolate_pos_encoding=True in the signature when calling the model as discussed in the issue #34415.

Manuel Sanchez Hernandez added 5 commits August 11, 2024 19:21

adding positional encoder changes and tests

63e8a34

adding ruff suggestions

bf6ddf2

changes added by python utils/check_copies.py --fix_and_overwrite

c1e5058

removing pos_encoding added by script

19aaa92

adding interpolation to clipseg

b282796

manuelsh marked this pull request as ready for review August 11, 2024 23:53

Manuel Sanchez Hernandez added 3 commits August 12, 2024 17:50

formatting

14d6001

adding further testing to altclip and better documentation to kosmos2

48128b1

skipping test_inputs_embeds_matches_input_ids_with_generate in git model

8eb1beb

This was referenced Aug 13, 2024

Interpolate clip #31900

Closed

fixes clip interpolate #30783

Closed

amyeroberts added the run-slow label Aug 14, 2024

amyeroberts reviewed Aug 14, 2024

View reviewed changes

Manuel Sanchez Hernandez added 2 commits August 15, 2024 23:58

fixing clipseg comment suggestions

7ced086

[run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, ko…

cac7886

…smos2, x_clip

Manuel Sanchez Hernandez added 5 commits August 16, 2024 00:47

fixing bridgetower test

a17b554

fixing altclip tensor output POS test

c4e56fb

adding ruff formatting

e303547

fixing several tests

ee8318d

formatting with ruff

20778a3

manuelsh and others added 6 commits August 21, 2024 00:48

Merge branch 'huggingface:main' into interpolate-clip-b

da9108a

adding positional encoder changes and tests

024ea6e

adding ruff suggestions

9c645e3

changes added by python utils/check_copies.py --fix_and_overwrite

19ad494

removing pos_encoding added by script

b383517

adding interpolation to clipseg

578411c

Manuel Sanchez Hernandez added 2 commits September 15, 2024 16:09

adding new interpolate_pos_encoding function

ea54d25

[run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, ko…

9d751a6

…smos2, x_clip

amyeroberts approved these changes Sep 16, 2024

View reviewed changes

Manuel Sanchez Hernandez added 2 commits September 18, 2024 15:36

fixing interpolate_POS funciton

f36537b

adapting output tensor in teests

4170cba

Manuel Sanchez Hernandez added 4 commits September 24, 2024 10:53

fixing conflict to merge

5c593bc

[run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, ko…

d00d7b3

…smos2, x_clip

modifying output tensor

44f9695

[run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, ko…

d70c2b3

…smos2, x_clip

Manuel Sanchez Hernandez added 5 commits September 25, 2024 11:20

adding the correct tensor

299b979

[run_slow] clipseg

55572b4

fixing spaces

d121d89

[run_slow] clipseg

7afedcf

[run_slow] clipseg

3be2b60

amyeroberts merged commit a55adee into huggingface:main Sep 25, 2024
18 checks passed

manuelsh mentioned this pull request Sep 25, 2024

Community contribution: enable dynamic resolution input for more vision models. #30579

Open

11 tasks

ArthurZucker mentioned this pull request Oct 4, 2024

PaliGemma (and probably SigLIP) inference broken in latest transformers version #33929

Closed

mcmonkey4eva reviewed Oct 25, 2024

View reviewed changes

NielsRogge mentioned this pull request Oct 25, 2024

[CLIPSeg] Make interpolate_pos_encoding default to True #34419

Merged

neggles mentioned this pull request Oct 27, 2024

ClipSeg broken #34415

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adding positional encoder changes and tests #32600

adding positional encoder changes and tests #32600

manuelsh commented Aug 11, 2024

manuelsh commented Aug 12, 2024 •

edited

Loading

amyeroberts left a comment

amyeroberts Aug 14, 2024

amyeroberts Aug 14, 2024

manuelsh Aug 15, 2024

amyeroberts Aug 16, 2024

amyeroberts Aug 22, 2024

amyeroberts Aug 14, 2024

amyeroberts Aug 14, 2024

amyeroberts Aug 14, 2024

manuelsh commented Aug 15, 2024 •

edited

Loading

manuelsh commented Aug 16, 2024

amyeroberts commented Aug 19, 2024

manuelsh commented Sep 15, 2024

amyeroberts left a comment

amyeroberts commented Sep 16, 2024

manuelsh commented Sep 16, 2024 •

edited

Loading

manuelsh commented Sep 18, 2024

amyeroberts commented Sep 23, 2024

manuelsh commented Sep 24, 2024

manuelsh commented Sep 25, 2024

amyeroberts commented Sep 25, 2024

manuelsh commented Sep 25, 2024

mcmonkey4eva Oct 25, 2024 •

edited

Loading

manuelsh Oct 25, 2024

	@unittest.skip(reason="GitForCausalLM does not support inputs_embeds in generate method")
	def test_inputs_embeds_matches_input_ids_with_generate(self):
	pass

		return_dict (`bool`, optional):
		Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.

	interpolate_pos_encoding (`bool`, optional):
	interpolate_pos_encoding (`bool`, optional, defaults to `False`):

adding positional encoder changes and tests #32600

adding positional encoder changes and tests #32600

Conversation

manuelsh commented Aug 11, 2024

manuelsh commented Aug 12, 2024 • edited Loading

amyeroberts left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

manuelsh commented Aug 15, 2024 • edited Loading

manuelsh commented Aug 16, 2024

amyeroberts commented Aug 19, 2024

manuelsh commented Sep 15, 2024

amyeroberts left a comment

Choose a reason for hiding this comment

amyeroberts commented Sep 16, 2024

manuelsh commented Sep 16, 2024 • edited Loading

manuelsh commented Sep 18, 2024

amyeroberts commented Sep 23, 2024

manuelsh commented Sep 24, 2024

manuelsh commented Sep 25, 2024

amyeroberts commented Sep 25, 2024

manuelsh commented Sep 25, 2024

mcmonkey4eva Oct 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

manuelsh commented Aug 12, 2024 •

edited

Loading

manuelsh commented Aug 15, 2024 •

edited

Loading

manuelsh commented Sep 16, 2024 •

edited

Loading

mcmonkey4eva Oct 25, 2024 •

edited

Loading